
RL Configuration Guide (with LiftObj as an example)

This page explains how reinforcement learning (RL) environments are defined and configured in LW-BenchHub, using the LiftObj task as a concrete example. It covers how robot and task RL configs are composed, where the configurations live, and how to train and evaluate with SKRL.

High-level composition

  • An RL environment is composed from:
    • A robot RL variant (actions, frames, sensors) under lw_benchhub/core/robots/<robot>/
    • A task RL environment config (observations, rewards, events, curriculum) under lw_benchhub_rl/<task>/
    • A scene/layout (kitchen variants or USD)
  • Both robot and task RL configs are registered in lw_benchhub_rl/__init__.py using a helper, which also links the SKRL agent config (YAML).

Where things are (LiftObj)

  • Task RL env config (Python): lw_benchhub_rl/lift_obj/lift_obj.py
  • Robot RL variant (Python): lw_benchhub/core/robots/unitree/g1.py (class UnitreeG1HandEnvRLCfg)
  • RL environment registration: lw_benchhub_rl/__init__.py
  • Global RL presets (YAML): configs/rl/skrl/rl_base.yml
  • Task preset (YAML): configs/rl/skrl/g1_liftobj.yaml
  • SKRL agent config (YAML): lw_benchhub_rl/lift_obj/agents/skrl_ppo_cfg.yaml
  • Training/Eval scripts: lw_benchhub/scripts/rl/train.py, lw_benchhub/scripts/rl/play.py, shell wrappers train.sh, eval.sh

Robot RL variant (G1-RL)

The robot RL variant adapts the robot for RL control, e.g. exposing action heads as joint-position targets and defining end-effector frames. For Unitree G1-Right-Hand:

File: lw_benchhub/core/robots/unitree/g1.py

Key points in UnitreeG1HandEnvRLCfg:

  • Sets robot_name = "G1-RL" for RL usage
  • Replaces teleop actions with RL action head (e.g., JointPositionActionCfg for the right arm joints)
  • Defines ee_frame (FrameTransformer) with targets for TCP and fingertips, used in observations/rewards
  • Sets reward-related joint name patterns for gripper and arm

Effect: The robot provides the correct action/observation interfaces for the RL task and rewards.
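
For orientation, a condensed sketch of what such a variant can look like is shown below. Only robot_name = "G1-RL", the JointPositionActionCfg action head, the ee_frame FrameTransformer, and the right_hand_.* / right_wrist_yaw_link names come from this page; the class layout, joint patterns, and prim paths are illustrative assumptions, not the repository's actual definition.

from isaaclab.envs.mdp import JointPositionActionCfg
from isaaclab.sensors import FrameTransformerCfg
from isaaclab.utils import configclass


@configclass
class UnitreeG1HandRLSketchCfg:
    """Illustrative stand-in for UnitreeG1HandEnvRLCfg (base class and fields simplified)."""

    # Robot name used to select the RL variant (e.g. robot: G1-RL in the preset)
    robot_name: str = "G1-RL"

    # RL action head: joint-position targets for the right arm (joint patterns illustrative)
    arm_action = JointPositionActionCfg(
        asset_name="robot",
        joint_names=["right_shoulder_.*", "right_elbow_.*", "right_wrist_.*"],
        scale=0.5,
    )

    # End-effector frames consumed by observations/rewards (prim paths illustrative)
    ee_frame = FrameTransformerCfg(
        prim_path="{ENV_REGEX_NS}/Robot/pelvis",
        target_frames=[
            FrameTransformerCfg.FrameCfg(
                prim_path="{ENV_REGEX_NS}/Robot/right_wrist_yaw_link", name="tcp"
            ),
        ],
    )

    # Joint-name patterns referenced by gripper/arm reward terms
    gripper_joint_names = ["right_hand_.*"]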


Task RL env config (LiftObj)

File: lw_benchhub_rl/lift_obj/lift_obj.py

The LiftObj RL config composes the base task with RL-specific managers.

Commands

@configclass
class CommandsCfg:
    object_pose = mdp.UniformPoseCommandCfg(
        asset_name="robot",
        body_name=MISSING,  # bound in G1LiftObjStateRL.setup_env_config (see below)
        resampling_time_range=(5.0, 5.0),
        debug_vis=False,
        ranges=mdp.UniformPoseCommandCfg.Ranges(
            pos_x=(-0.05, 0.05), pos_y=(-0.20, -0.05), pos_z=(0.3, 0.35),
            roll=(0.0, 0.0), pitch=(0.0, 0.0), yaw=(0.0, 0.0),
        ),
    )

The task samples a target object pose through the object_pose command. In G1LiftObjStateRL (shown below), the command's body_name is bound to the robot's end-effector link, e.g. right_wrist_yaw_link.

Policy Observations

@configclass
class G1LiftObjStateObsCfg(RlBasePolicyObservationCfg):
    joint_pos = ObsTerm(func=mdp.joint_pos_rel)
    joint_vel = ObsTerm(func=mdp.joint_vel_rel)
    target_object_position = ObsTerm(
        func=mdp.generated_commands, params={"command_name": "object_pose"}
    )
    actions = ObsTerm(func=mdp.last_action)
    image_hand = ObsTerm(
        func=mdp.image_features,
        params={"sensor_cfg": SceneEntityCfg("hand_camera"), "data_type": "rgb",
                "model_name": "resnet18", "model_device": "cuda:0"},
    )
    rgb_d435 = ObsTerm(
        func=mdp.image_features,
        params={"sensor_cfg": SceneEntityCfg("d435_camera"), "data_type": "rgb",
                "model_name": "resnet18", "model_device": "cuda:0"},
    )

    def __post_init__(self):
        self.enable_corruption = True
        self.concatenate_terms = True

Notes:

  • Proprioceptive terms use relative joint pos/vel
  • generated_commands injects the sampled object_pose command
  • Two image feature terms are provided (hand camera and D435)
  • Observations are concatenated for the policy

Rewards

@configclass
class RewardsCfg:
    reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
    lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.98}, weight=15.0)
    lifting_object_grasped = RewTerm(
        func=mdp.object_is_lifted_grasped,
        params={"grasp_threshold": 0.08, "velocity_threshold": 0.15},
        weight=8.0,
    )
    gripper_close_action_reward = RewTerm(
        func=mdp.gripper_close_action_reward,
        weight=1.0,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["right_hand_.*"])},
    )
    object_goal_tracking = RewTerm(
        func=mdp.object_goal_distance,
        params={"std": 0.3, "minimal_height": 0.97, "command_name": "object_pose"},
        weight=16.0,
    )
    action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)
    joint_vel = RewTerm(func=mdp.joint_vel_l2, weight=-1e-4, params={"asset_cfg": SceneEntityCfg("robot")})

Notes:

  • Dense shaping includes proximity (object_ee_distance) and goal tracking (object_goal_distance)
  • Success-related terms check lifting height and near-grasp conditions (see the term sketch after this list)
  • Regularizers penalize fast actions and joint velocities
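
Reward terms like those above are plain functions that receive the environment and return a per-environment tensor; the reward manager scales each term by its configured weight and sums them. As a rough sketch of the pattern (modeled on the standard Isaac Lab manager-based API; the actual mdp.object_is_lifted in lw_benchhub_rl may differ):

import torch

from isaaclab.assets import RigidObject
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def object_is_lifted(
    env: ManagerBasedRLEnv,
    minimal_height: float,
    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
) -> torch.Tensor:
    """Return 1.0 for every environment whose object is above minimal_height (world z)."""
    obj: RigidObject = env.scene[object_cfg.name]
    return (obj.data.root_pos_w[:, 2] > minimal_height).float()

Shaped terms such as object_ee_distance and object_goal_distance follow the same signature but typically return smooth values (for example, a distance kernel controlled by the std parameter) instead of a binary indicator.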

Events and Curriculum

@configclass
class EventCfg:
    reset_all = EventTerm(func=mdp.reset_scene_to_default, mode="reset")
    reset_object_position = EventTerm(
        func=mdp.reset_root_state_uniform, mode="reset",
        params={
            "pose_range": {"x": (-0.05, 0.05), "y": (-0.05, 0.05), "z": (0.0, 0.0), "yaw": (0.0, 90.0)},
            "velocity_range": {},
            "asset_cfg": SceneEntityCfg("object", body_names="apple_11"),
        },
    )


@configclass
class CurriculumCfg:
    # After 36k steps, strengthen the regularization penalties (from -1e-4 to -5e-3).
    action_rate = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "action_rate", "weight": -5e-3, "num_steps": 36000})
    joint_vel = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "joint_vel", "weight": -5e-3, "num_steps": 36000})

Compose the RL config

@rl_on(task=LiftObj)
@rl_on(embodiment="G1-RL")
class G1LiftObjStateRL(LwRL):
    def __init__(self):
        super().__init__()
        # define or reuse cfgs here
        self.rewards_cfg = RewardsCfg()
        self.events_cfg = EventCfg()
        self.curriculum_cfg = CurriculumCfg()
        self.commands_cfg = CommandsCfg()
        self.policy_observation_cfg = G1LiftObjStateObsCfg()
        self.resample_objects_placement_on_reset = False
        self.resample_robot_placement_on_reset = False

    def setup_env_config(self, orchestrator):
        super().setup_env_config(orchestrator)
        # bind the command to the G1 right end-effector link
        orchestrator.task.commands_cfg.object_pose.body_name = "right_wrist_yaw_link"

Terminations

BaseRLEnvCfg includes a default time_out termination. The training scripts disable the time-limit termination at runtime and handle episode rollouts explicitly.
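
For reference, a time-out termination in the manager-based API usually looks like the sketch below (assuming Isaac Lab's standard mdp.time_out term; BaseRLEnvCfg's actual definition is not shown on this page):

from isaaclab.envs import mdp
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.utils import configclass


@configclass
class TerminationsCfg:
    # Episode ends when the configured episode length is reached; time_out=True marks
    # it as a truncation (time limit) rather than a failure termination.
    time_out = DoneTerm(func=mdp.time_out, time_out=True)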


Registration and presets

RL environment registration

File: lw_benchhub_rl/__init__.py

gym.register(
    id="Robocasa-Rl-RobotTaskRL",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": f"{__name__}.lift_obj.lift_obj:G1LiftObjStateRL",
        "skrl_cfg_entry_point": f"{lift_obj_agents.__name__}:skrl_ppo_cfg.yaml",
        "rsl_rl_cfg_entry_point": f"{lift_obj_agents.__name__}:rsl_rl_ppo_cfg:PPORunnerCfg",
    },
    disable_env_checker=True,
)

This registers the composed environment under a Gym ID and links the SKRL agent YAML (along with an RSL-RL runner entry point).

Global preset: rl_base.yml

File: configs/rl/skrl/rl_base.yml

Important fields:

  • disable_fabric (if false ⇒ env uses fabric; faster sim I/O)
  • num_envs, layout, usd_simplify, robot_scale, first_person_view
  • rl: Specifies the RL config name binding for the robot-task combination.
  • Logging/video controls: video, video_length, video_interval
  • Training controls: seed, distributed, max_iterations
  • ML framework and algorithm selection: ml_framework, algorithm
  • Checkpoint: checkpoint, use_pretrained_checkpoint, real_time
  • enable_cameras: enable cameras in the env (required by Visual variants)
  • variant: RL variant name (e.g., State or Visual), used when registering RL configs

Task preset: g1_liftobj.yaml

File: configs/rl/skrl/g1_liftobj.yaml

Overrides and bindings:

  • _base_: rl_base
  • task: LiftObj
  • robot: G1-RL
  • layout: robocasakitchen-1-8
  • rl: G1LiftObjStateRL
  • num_envs: 200
  • usd_simplify: true
  • variant: State (defaults to the State-based LiftObj)
  • enable_cameras: false (set true for Visual variant)

SKRL agent configuration (LiftObj)

File: lw_benchhub_rl/lift_obj/agents/skrl_ppo_cfg.yaml

Key sections:

  • models: network definitions for the policy and value function; GaussianMixin is used for continuous actions.
  • memory: rollout buffer backend (RandomMemory); memory_size: -1 sizes the buffer to the agent's rollouts.
  • agent (PPO): core hyperparameters (see the mapping sketch after this list):
    • rollouts, learning_epochs, mini_batches
    • discount_factor (gamma), lambda (GAE lambda)
    • learning_rate and scheduler (KLAdaptiveLR with kl_threshold)
    • ratio_clip, value_clip, entropy_loss_scale, value_loss_scale
    • rewards_shaper_scale: multiplies rewards before updates
    • experiment: logging/checkpoint frequency and directory
  • trainer: training driver and timesteps for total steps
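
The agent block corresponds directly to skrl's PPO hyperparameter dictionary. A minimal sketch of that mapping is shown below; all numeric values are illustrative placeholders, not the values shipped in skrl_ppo_cfg.yaml, and the training script normally builds this configuration from the YAML automatically.

from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG
from skrl.resources.schedulers.torch import KLAdaptiveLR

cfg = PPO_DEFAULT_CONFIG.copy()
cfg["rollouts"] = 24                    # env steps collected per update (illustrative)
cfg["learning_epochs"] = 8
cfg["mini_batches"] = 4
cfg["discount_factor"] = 0.99           # gamma
cfg["lambda"] = 0.95                    # GAE lambda
cfg["learning_rate"] = 1.0e-4
cfg["learning_rate_scheduler"] = KLAdaptiveLR
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg["ratio_clip"] = 0.2
cfg["value_clip"] = 0.2
cfg["entropy_loss_scale"] = 0.01
cfg["value_loss_scale"] = 1.0
cfg["rewards_shaper"] = lambda rewards, *args: rewards * 0.1   # rewards_shaper_scale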

Practical tips:

  • Effective batch = num_envs * rollouts (per update); increase it as GPU memory allows (see the worked example after this list).
  • Use the KL-based LR scheduler (KLAdaptiveLR) to stabilize large-batch PPO.
  • Increase entropy_loss_scale for more exploration early on, then tune it back down.
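
As a quick worked example, with num_envs: 200 from g1_liftobj.yaml and assumed values rollouts = 24 and mini_batches = 4 (illustrative; the real values come from the agent YAML):

num_envs = 200        # from g1_liftobj.yaml
rollouts = 24         # illustrative; taken from skrl_ppo_cfg.yaml in practice
mini_batches = 4      # illustrative

batch_size = num_envs * rollouts              # 4800 transitions per PPO update
minibatch_size = batch_size // mini_batches   # 1200 samples per gradient step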

Training and evaluation

Default run via scripts:

bash train.sh   # uses configs/rl/skrl/g1_liftobj_state.yaml by default
bash eval.sh # evaluate the latest checkpoint

Or directly via Python:

python ./lw_benchhub/scripts/rl/train.py --task_config g1_liftobj_state --headless 
python ./lw_benchhub/scripts/rl/play.py --task_config g1_liftobj_state

Troubleshooting

  • OOM / low FPS: reduce num_envs, disable cameras, set usd_simplify: true.
  • Instability: lower learning_rate, increase learning_epochs, reduce mini_batches.
  • Poor exploration: increase entropy_loss_scale or relax reward shaping initially.
  • Reward plateaus: verify object_pose command ranges, minimal_height thresholds, and camera feature devices.